Abstract. In this paper we perform a statistical analysis of the high-frequency returns
of the Ibex35 Madrid stock exchange index. We find that its probability distribution
seems to be stable over different time scales, a stylized fact observed in many
different financial time series. However, an in-depth analysis of the data using
maximum likelihood estimation and different goodness-of-fit tests rejects the Levy-stable
law as a plausible underlying probabilistic model. The analysis shows that the Normal
Inverse Gaussian distribution provides an overall fit for the data better than any
of the other subclasses of the family of the Generalized Hyperbolic distributions and
certainly much better than the Levy-stable laws. Furthermore, the right (resp. left)
tail of the distribution seems to follow a power-law with exponent $\alpha \approx 4.60$
(resp. $\alpha \approx 4.28$). Finally, we present evidence that the observed stability is
due to temporal correlations or non-stationarities of the data.
1. Introduction



The marginal distribution of returns of financial assets has been under scrutiny
since the times of Bachelier [H], and the idea of treating log-returns as independent
identically distributed Gaussian random variables lies at the core of the most
well-known and celebrated financial models [9, 37], which makes it crucial for derivative
pricing and risk management. Even though this idea works fine as a first approximation,
it is well documented that empirical financial data drawn from very different
markets, time periods and instruments do not fit the Gaussian model [15, 20, 33, 38].
The empirical distributions of log-returns present tails heavier than Gaussian as well
as many other non-trivial statistical properties, collectively known as stylized facts [14],
that place the Gaussian hypothesis in jeopardy and point towards a possible universal
behavior of the underlying processes.

One of the most celebrated of these stylized facts is the scaling symmetry or stability
of the distribution of log-returns, i.e. its invariance under aggregation up to rescaling.
For independent identically distributed random variables the Gaussian law is the only
distribution with finite second moment that has this property, and that is why the
central limit theorem singles it out as the limiting distribution of rescaled sums of
i.i.d. random variables with finite variance [11]. To observe stability in distributions
other than Gaussian the requirement of a finite second moment has to be dropped. This
seminal idea was first expounded in finance by Mandelbrot [33], who proposed the family
of Levy-stable laws as an alternative to the Gaussian model of log-returns. One feature
of these probability distributions is the divergence of their second moment, caused by
the power-law behavior of their tails with characteristic exponent $\alpha < 2$. Considering
a financial market as a complex system of interacting agents in which prices are the
outcome of many independent individual decisions, according to the generalized central
limit theorem its limiting distribution should be a member of the Levy-stable family
(of which the Gaussian distribution is just a special case), with the normalization
constant depending on the tail index of the power-law [11].
Therefore, it seems natural to investigate the tails of the distribution of log-returns
in order to shed some light on its stability properties. However, this is a moot point:
although some authors [20, 33] have reported power-law behavior with $\alpha < 2$, others
have reported distributions with power-law tails far away from the stability regime
[21, 23, 29, 30, 40]. It could be argued that there is an endemic arbitrariness in the
least-squares regression used to study the power-law behaviour in empirical data, but that
problem could be overcome by replacing it with maximum likelihood estimation together
with goodness-of-fit techniques. This fact notwithstanding, it is also well known
that an estimated tail index above two is not evidence against stability: it could well
have been produced by a stable distribution with $\alpha$ as low as 1.65, with the situation
getting worse as we approach the $\alpha = 2$ limit [16, 36]. And to round this off, it is not
only difficult to discriminate between different power-laws or even between stability and
the lack of it; the sole task of distinguishing a power-law from a stretched exponential
is still subject to debate [32], since certain empirical distributions of log-returns seem
to decay asymptotically slower than any power-law [15, 24, 28].

Among the distributions with tails lighter than power-laws, a family that has been
used with success to model log-returns is that of the Generalized Hyperbolic laws. The
embryo of this family of distributions is the Hyperbolic distribution, first proposed by
Ralph Alger Bagnold [6] to model the size distribution of wind-blown sand. Later,
Barndorff-Nielsen (still with the problem of the distribution of particle size in mind)
generalized it to the family of Generalized Hyperbolic (GH) distributions [7], of which
the hyperbolic distribution is a special case. Different subclasses of this family have
since been proposed as alternatives to both Gaussian and Levy-stable laws as statistical
models of financial returns, namely, the Skewed Student's t distribution [10, 41], the
Hyperbolic distribution by Eberlein and Keller [19, 27], the Variance-Gamma of Madan
and Seneta [31] and the Normal Inverse Gaussian (NIG) by Barndorff-Nielsen [8]. One
of the most appealing properties of this family is its tail behavior, which is a power-law
modulated by an exponential. These lighter tails seem well suited to fit the empirical
distributions of log-returns as the studies cited above show, since the data seem to have
tails heavier than Gaussian but still lighter than the Levy-stable laws.

As can be inferred, the question of the true nature of the distribution of log-returns
(or even its tail behavior) and the origin of its apparent stability is far from being
settled, and therefore, the study of diverse financial time series drawn from different
instruments, markets and time periods is pertinent in order to shed some light on this
issue. In this paper we will carry out a thorough study of the distributional properties
of the high-frequency log-returns of the index Ibex35 from the Madrid Stock Exchange.
After reviewing the basic properties of both the Levy-stable and Generalized Hyperbolic
laws in Section 2, we will perform a series of fits of these families to the observed
log-returns as well as different statistical tests to quantify their goodness-of-fit
(Section 3). There, we will also study in close detail the tail behavior of the data in
order to elucidate its possible stability and we will address the question of the scaling
symmetry of the data. In Section 4 we will sum up the results of the previous section and
compare them with those obtained in other studies. The conclusions will be expounded
in Section 5.

2. Levy-stable laws and GH distributions 

2.1. Levy-stable laws. Levy-stable laws do not have a closed analytical form for their
probability density function in general, but they can be readily defined in terms of their
characteristic function $\varphi(t)$:

(1)  $\varphi(t) = \exp\left[\, i t \mu - |\delta t|^{\alpha} \left(1 - i \beta\, \mathrm{sgn}(t)\, \Phi\right) \right]$

(2)  $\Phi = \begin{cases} \tan\left(\frac{\pi\alpha}{2}\right) & \text{if } \alpha \neq 1 \\ -\frac{2}{\pi} \log|t| & \text{if } \alpha = 1 \end{cases}$

The characteristic exponent $\alpha \in (0, 2]$ determines the weight of the tails and the
skewness parameter $\beta \in [-1, 1]$ its asymmetry. The parameters $\mu$ and $\delta$ are its
location and scale parameters respectively. The Gaussian distribution is a special case of
this family with $\alpha = 2$, $\beta = 0$. A random variable X is the limit in distribution of
normalized sums of i.i.d. random variables if and only if X has a Levy-stable law [11].
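
As an illustration of equations (1) and (2), the following Python sketch evaluates the characteristic function of a Levy-stable law. It is only a sketch under stated assumptions: the function name `stable_cf` is hypothetical, and for $\alpha = 1$ it assumes the standard form $\Phi = -(2/\pi)\log|t|$. The two values computed at the end correspond to the well-known special cases $\alpha = 2$, $\beta = 0$ (Gaussian) and $\alpha = 1$, $\beta = 0$ (Cauchy).

```python
import cmath
import math

def stable_cf(t, alpha, beta, mu, delta):
    """Characteristic function of a Levy-stable law, following equations (1)-(2).

    Hypothetical helper for illustration; the alpha = 1 branch assumes the
    standard form Phi = -(2/pi) * log|t|.
    """
    if t == 0.0:
        return 1.0 + 0.0j  # every characteristic function equals 1 at t = 0
    sign = 1.0 if t > 0 else -1.0
    if alpha != 1.0:
        big_phi = math.tan(math.pi * alpha / 2.0)
    else:
        big_phi = -(2.0 / math.pi) * math.log(abs(t))
    exponent = 1j * t * mu - abs(delta * t) ** alpha * (1.0 - 1j * beta * sign * big_phi)
    return cmath.exp(exponent)

# alpha = 2, beta = 0: Gaussian case, phi(t) = exp(-delta^2 t^2)
gaussian_value = stable_cf(1.0, alpha=2.0, beta=0.0, mu=0.0, delta=1.0)
# alpha = 1, beta = 0: Cauchy case, phi(t) = exp(-delta |t|)
cauchy_value = stable_cf(2.0, alpha=1.0, beta=0.0, mu=0.0, delta=1.0)
```

Note that with this parametrization the Gaussian case has variance $2\delta^2$, so $\delta$ is a scale parameter but not the standard deviation.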

2.2. Generalized Hyperbolic laws. The Generalized Hyperbolic distribution can be
parametrized in several ways. Following Prause [42], its probability density function
can be written as:

(3)  $f(x; \lambda, \delta, \alpha, \mu, \beta) = \frac{(\gamma/\delta)^{\lambda}}{\sqrt{2\pi}\, K_{\lambda}(\delta\gamma)} \, \frac{K_{\lambda-1/2}\left(\alpha\sqrt{\delta^2 + (x-\mu)^2}\right)}{\left(\sqrt{\delta^2 + (x-\mu)^2}\,/\,\alpha\right)^{1/2-\lambda}} \, e^{\beta(x-\mu)}$

where $\gamma = \sqrt{\alpha^2 - \beta^2}$ and $K_{\lambda-1/2}$ is the modified Bessel function of
the third kind with index $\lambda - 1/2$. The parameter $\alpha > 0$ determines the shape of the
distribution and $0 \le |\beta| < \alpha$ its skewness. The usual location and scale parameters
are $\mu$ and $\delta$. The parameter $\lambda$ characterizes certain subclasses and influences the
size of the mass contained in the tails. These distributions can be thought of as
mean-variance mixtures of Gaussian distributions where the mixing distribution is the
Generalized Inverse Gaussian distribution [18].



Letting $\lambda = -1/2$ we obtain the Normal Inverse Gaussian distribution (NIG), whose
probability density function is:

(4)  $f(x) = \frac{\alpha\delta\, K_1\left(\alpha\sqrt{\delta^2 + (x-\mu)^2}\right)}{\pi\sqrt{\delta^2 + (x-\mu)^2}} \, e^{\delta\sqrt{\alpha^2-\beta^2} + \beta(x-\mu)}$

All its moments are well defined since it decays as $|x|^{-3/2} e^{-\alpha|x| + \beta x}$. The NIG
distribution is the only subclass of the GH family which is closed under convolution; this
fact greatly simplifies the computations for option pricing.
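
The NIG density (4) can be evaluated numerically with a few lines of Python. The sketch below is a hedged illustration, not the procedure used in this study: `bessel_k1` is a hypothetical helper that approximates $K_1$ through the integral representation $K_1(z) = \int_0^\infty e^{-z\cosh t}\cosh t\, dt$ with the trapezoidal rule (a library Bessel function would normally be used instead), and the parameter values are the NIG estimates reported in Table 2.

```python
import math

def bessel_k1(z, steps=400):
    """Approximate K_1(z), z > 0, from K_1(z) = int_0^inf exp(-z cosh t) cosh t dt.

    Hypothetical helper: plain trapezoidal rule on [0, t_max].
    """
    t_max = math.acosh(1.0 + 745.0 / z)  # beyond t_max the integrand underflows to zero
    h = t_max / steps
    total = 0.5 * math.exp(-z)           # half weight for the endpoint t = 0
    for k in range(1, steps + 1):
        t = k * h
        total += math.exp(-z * math.cosh(t)) * math.cosh(t)  # last term is negligible
    return total * h

def nig_pdf(x, alpha, beta, mu, delta):
    """NIG density, equation (4)."""
    gamma = math.sqrt(alpha ** 2 - beta ** 2)
    s = math.sqrt(delta ** 2 + (x - mu) ** 2)
    return (alpha * delta * bessel_k1(alpha * s) / (math.pi * s)
            * math.exp(delta * gamma + beta * (x - mu)))

# NIG parameters as reported in Table 2
NIG_PARAMS = dict(alpha=0.6490, beta=-0.0103, mu=0.0101, delta=0.6365)

# Sanity check: the density should integrate to approximately one
step = 0.05
total_mass = sum(nig_pdf(-40.0 + step * k, **NIG_PARAMS) for k in range(1601)) * step
```

The integration range $[-40, 40]$ is an arbitrary choice; with these parameters the exponential decay of the tails makes the mass outside it negligible.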
Letting $\lambda = -\nu/2$ and $\alpha \to |\beta|$ in formula (3) above, we obtain the GH skew
Student's t distribution. Its density is given by:

(5)  $f(x) = \frac{2^{\frac{1-\nu}{2}}\, \delta^{\nu}\, |\beta|^{\frac{\nu+1}{2}}\, K_{\frac{\nu+1}{2}}\left(\sqrt{\beta^2\left(\delta^2 + (x-\mu)^2\right)}\right) e^{\beta(x-\mu)}}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi}\left(\sqrt{\delta^2 + (x-\mu)^2}\right)^{\frac{\nu+1}{2}}}$

This is the only GH subfamily whose density function has a different asymptotic behaviour
in each tail: one tail is a power-law with characteristic exponent equal to $-\nu/2 - 1$ and
the other is a power-law with exponent $-\nu/2 - 1$ modulated by a factor $e^{-2|\beta||x|}$. If
the asymmetry parameter $\beta$ is zero, we recover the classical Student's t distribution with
symmetric power-law tails, with a well defined second moment for $\nu > 2$ [1].

3. Analysis of the data 

3.1. The data. Our data set contains the price ticks of the index Ibex35 of the Madrid
Stock Exchange (footnote 1). The Ibex35 is a weighted index formed by the 35 most liquid
Spanish stocks traded at the Madrid Stock Exchange. The data set covers the period
from January 2nd 2009 to December 31st 2010 and comprises 510 market days.

The values of the index are not updated evenly; the time between records oscillates
between two and around twelve seconds. In order to have a well defined time interval we
have sampled these ticks in fifteen-second intervals, obtaining a series with 1036321
records. From this time series we have obtained the log-returns. However, some issues had
to be taken into account before doing this. First, we have to discard the discontinuity
created overnight to avoid artifacts: we therefore focus exclusively on intraday returns.
Second, there is a 30 second uncertainty in the closing time of the session in order to
avoid arbitrages: we have accordingly taken a security margin, finishing our sessions at
17:29. Some authors [32] have also pointed out that the volatility pattern present during
the day (the "lunch effect"; see Figure 1) should be taken into account by normalizing
each return with the average absolute return of that time of the session. However, as
happened in the study cited above, in ours we have not observed substantial differences
between the raw and the normalized data; therefore, we have worked exclusively with
raw returns. The final return series contains 1035810 records, with 2031 records for each
market day (Figure 2). The sample statistics, computed once the returns are normalized
in scale and location, can be found in Table 1.
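
The construction of the return series can be sketched in Python as follows. This is a minimal illustration under assumptions: the input is taken to be one list of regularly sampled index values per session, the function names are hypothetical, and the toy prices are invented for the example. Only the counts (510 market days, 2031 returns per day) come from the text.

```python
import math

MARKET_DAYS = 510        # from the text
RETURNS_PER_DAY = 2031   # from the text

def intraday_log_returns(prices_by_day):
    """Log-returns computed inside each session only, so that no return
    spans the overnight gap between consecutive sessions."""
    returns = []
    for prices in prices_by_day:
        for previous, current in zip(prices, prices[1:]):
            returns.append(math.log(current / previous))
    return returns

def standardize(values):
    """Normalize in location and scale (zero mean, unit standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

# Toy example with two invented sessions of three sampled prices each
toy_days = [[100.0, 101.0, 100.5], [102.0, 101.0, 103.0]]
toy_returns = intraday_log_returns(toy_days)   # 4 returns, none across sessions
toy_normalized = standardize(toy_returns)
```

With 2031 returns per session, the 510 sessions yield the 1035810 records quoted above.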



1 Data obtained from www.tickdata.com.



STATISTICAL ANALYSIS OF THE IBEX35 INDEX 



Figure 1. Lunch effect (horizontal axis: time). At 15:30 CET Wall Street opens, and at
14:30 CET and 16:00 CET macroeconomic indicators in the USA are announced.



Max.      Min.       Mean     Std. dev.   Skewness   Kurtosis
29.181    -28.184    0.000    1.000       -0.241     13.659

Table 1. Sample statistics.



3.2. Estimation of the parameters. The parameters of all the distributions have been
estimated by the method of maximum likelihood. The asymptotic properties and optimality
of this method of estimation are widely acknowledged [43]. However, for the family of
Levy-stable laws, parameter estimation via maximum likelihood is not straightforward
because an analytical expression of the probability density function is not available;
the method is therefore only applicable by numerical approximation, which is very time
consuming due to the sample size. Other faster possibilities include methods based on
the sample quantiles [35] or regression via the sample characteristic function; see [46]
for a survey of the most usual estimation methods for this family of distributions. The
estimated parameters can be found in Table 2, and Figure 3 shows the semilog plots of
the estimated densities and the empirical histogram.
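
The principle of maximum likelihood estimation can be illustrated with a deliberately simplified Python example. It is a toy sketch, not the fitting procedure of this study: it estimates only the degrees of freedom $\nu$ of a standard (symmetric, unit scale) Student's t from simulated data, using a golden-section search on the negative log-likelihood. All function names, the seed and the sample size are assumptions made for the example.

```python
import math
import random

def student_t_nll(nu, sample):
    """Negative log-likelihood of a standard Student's t with nu degrees of freedom."""
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return -sum(const - 0.5 * (nu + 1.0) * math.log1p(x * x / nu) for x in sample)

def golden_section_min(f, lo, hi, tol=1e-3):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d   # the minimum lies in [lo, d]
        else:
            lo = c   # the minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)

# Simulated data: t = Z / sqrt(chi2_nu / nu), with chi2_nu = Gamma(shape nu/2, scale 2)
rng = random.Random(12345)
true_nu = 3.0
sample = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(true_nu / 2.0, 2.0) / true_nu)
          for _ in range(3000)]

nu_hat = golden_section_min(lambda nu: student_t_nll(nu, sample), 1.0, 30.0)
```

For the full families considered here the likelihood has to be maximized jointly over four or five parameters, which calls for a multivariate optimizer, but the objective function is built in the same way.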
Figure 2. Ibex35 normalized logarithmic returns.



Parameters     $\mu$      $\delta$   $\beta$     $\alpha$, $\nu$   $\lambda$
Levy-stable    0.0071     0.4825      0.0102     1.5358
GH             0.0101     0.6495     -0.0103     0.6296            -0.5352
Student's t    0.0101     0.9643     -0.0089     2.7029
NIG            0.0101     0.6365     -0.0103     0.6490

Table 2. Estimated distribution parameters.



3.3. Goodness-of-fit tests. To quantify the goodness-of-fit of the estimated distributions
three different statistical tests have been used: the $\chi^2$, the Kolmogorov-Smirnov
and the Anderson-Darling [4] tests. The last two tests, which are based on the cumulative
distribution function rather than on the probability density function as the simpler
$\chi^2$ test is, make better use of the information contained in the sample since it does
not need to be binned. Their drawback, however, is that they are much more computationally
intensive, since the distribution function has to be evaluated at the sample
points and this implies millions of numerical integrations of transcendental functions.
Apart from this, the large-sample distribution of the test statistic is known only when
the parameters are part of the hypothesis; for estimated parameters (as in this case),
this distribution is not known except for a few special cases [45]. Monte Carlo simulation,
the usual approach to tackle this problem [46], is not feasible here due to the
sample size and to the associated computational time needed to obtain the distribution
function. In any case, as Anderson [3] points out, the percentage points for the tests
when the parameters are estimated are much smaller than those obtained when the
parameters are known: a hypothesis rejected using the latter percentage points will
also be rejected, with an even higher confidence level, when using the former. As a
general rule, the lower the value of the test statistic, the better the fit.

Figure 3. Histogram and estimated pdfs.
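
The two statistics based on the distribution function can be computed as in the following Python sketch. It is a hedged illustration: the function names are hypothetical, the Gaussian distribution function is used only as a stand-in for the fitted model, and the constants 1.358 and 1.628 are the standard asymptotic Kolmogorov critical constants for fully specified parameters, which appear to reproduce the K-S column of Table 4 for the sample size of this study.

```python
import math

def normal_cdf(x):
    """Standard Gaussian distribution function (stand-in for a fitted model)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical and the model cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ad_statistic(sample, cdf):
    """Anderson-Darling statistic A^2; requires 0 < cdf(x) < 1 on the sample."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        total += (2 * i + 1) * (math.log(cdf(x)) + math.log(1.0 - cdf(xs[n - 1 - i])))
    return -n - total / n

def ks_critical(n, constant):
    """Asymptotic K-S critical value: constant / sqrt(n)."""
    return constant / math.sqrt(n)

toy_sample = [-1.0, 0.0, 1.0]
toy_ks = ks_statistic(toy_sample, normal_cdf)
toy_ad = ad_statistic(toy_sample, normal_cdf)

n_returns = 1035810                       # sample size quoted in Section 3.1
ks_5 = ks_critical(n_returns, 1.358)      # 5% level
ks_1 = ks_critical(n_returns, 1.628)      # 1% level
```

Each statistic needs one evaluation of the distribution function per sample point, which is where the computational cost discussed above comes from when the distribution function itself requires numerical integration.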



Test statistic   $\chi^2$    K-S       A-D
Levy-stable      17133.87    0.0175    512.17
GH               4556.36     0.0147    53.88
Student's t      6967.21     0.0148    193.73
NIG              4911.18     0.0147    52.30

Table 3. Goodness-of-fit statistics.



The values of the test statistics obtained for our data and the upper bounds for the
critical points of the tests for the given confidence levels are shown in Tables 3 and 4.
As can be observed, the null hypothesis is rejected at any reasonable confidence
level for all the distributions. However, the hyperbolic distributions clearly outperform
the Levy-stable law.
Confidence     $\chi^2$    K-S        A-D
5%             231.92      0.00134    0.4614
1%             245.48      0.00160    0.7435

Table 4. Critical points for the goodness-of-fit tests.

For the members of the GH family, a likelihood ratio ($\Lambda$) test has also been performed.
This allows us to quantify which of the two subclasses (i.e. NIG or Student's t) is
the soundest and whether or not the GH model can be reduced to one of its subfamilies.
The values obtained for the statistic $-2 \log \Lambda$ are tabulated in Table 5 along with the
p-values of a $\chi^2_1$ variable, its large-sample distribution under the hypothesis of
asymptotic normality of the maximum likelihood estimators. According to this, if we accept
the hypothesis that the data follow a GH distribution, we cannot reject with a confidence
level of 2% the hypothesis that they in fact follow a NIG distribution, while the hypothesis
that the data follow a Student's t distribution is rejected at any reasonable confidence
level (footnote 2).



Distribution   $-2 \log \Lambda$   p-value
NIG            5.49                0.02
Student's t    4551.78             < 10^{-16}

Table 5. Likelihood-ratio statistics for the GH subfamilies.
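
The p-values of Table 5 can be checked with a short Python computation. The sketch uses the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, which holds because a $\chi^2_1$ variable is the square of a standard Gaussian; the function names are hypothetical.

```python
import math

def chi2_1_sf(x):
    """P(X > x) for a chi-square variable X with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def likelihood_ratio_statistic(loglik_restricted, loglik_full):
    """-2 log(Lambda) from the maximized log-likelihoods of the two nested models."""
    return -2.0 * (loglik_restricted - loglik_full)

p_nig = chi2_1_sf(5.49)           # statistic reported for the NIG subfamily
p_student = chi2_1_sf(4551.78)    # statistic reported for the Student's t subfamily
```

Under these assumptions the first value rounds to the 0.02 reported in the table, and the second underflows to zero in double precision, consistent with the bound given there.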



3.4. Asymptotic behavior. Since all of the usual distributions are rejected as plausible
hypotheses for the data, we have also studied in detail the asymptotic behavior of
the tails. On a log-log plot (Figure 4) they seem to fit a straight line rather well. Using
the methodology proposed in [13] to analyze and estimate power-laws in empirical data,
we have obtained a value of 4.60 (resp. 4.28) for the characteristic exponent $\alpha$ and a
value of 7.76 (resp. 6.70) for the scale parameter $x_{\min}$ for the right (resp. left) tail.
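
A tail fit of this kind can be illustrated with a minimal Python sketch. It assumes the standard continuous maximum likelihood estimator for a density proportional to $x^{-\alpha}$ above a threshold $x_{\min}$; whether this matches the exact convention of [13] and of the exponents quoted above is an assumption, and the selection of $x_{\min}$ itself is not shown. The names `powerlaw_mle` and `synthetic_tail` are hypothetical, and the data are synthetic.

```python
import math

def powerlaw_mle(sample, x_min):
    """Maximum likelihood estimate of alpha for p(x) ~ x^(-alpha), x >= x_min.

    Returns the estimate, its asymptotic standard error and the tail size.
    """
    tail = [x for x in sample if x >= x_min]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err, n

def synthetic_tail(alpha, x_min, size):
    """Deterministic test data: quantiles of the power law at evenly spaced
    probabilities, using P(X > x) = (x / x_min)^(-(alpha - 1))."""
    return [x_min * (1.0 - (k + 0.5) / size) ** (-1.0 / (alpha - 1.0))
            for k in range(size)]

# Synthetic tail built with the right-tail values quoted in the text
alpha_hat, std_err, n_tail = powerlaw_mle(synthetic_tail(4.60, 7.76, 5000), 7.76)
```

Unlike a least-squares fit on the log-log plot, this estimator does not depend on any binning of the data, and its standard error shrinks as the number of points above $x_{\min}$ grows.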

Even though the tails seem to follow a power-law well outside the stability region,
a robust test to reject the hypothesis of stability or even of an exponential behavior
is needed. It is well known that for moderate sample sizes an observed tail index well
above two cannot be used as evidence against stability since it is a highly unreliable
estimator; if the distribution were really stable, an estimation of the tail parameter
using the full sample by maximum likelihood would be more pertinent [36].

Figure 4. Tails of the complementary cumulative distribution function.

3.5. Stability of the data. According to the results of the previous section and
considering the sample size, the most plausible hypothesis is the lack of stability of the
distribution of returns. However, when the returns are sampled at different time scales
$t$ (from fifteen seconds up to half a day) and rescaled with $t^{1/2}$, their distribution
seems to remain stable (see footnote 3 and Figure 5, top panel). This suggests that this
symmetry must be the result of the presence of long memory in the data or of the temporal
dependence of the parameters.

2 We also performed all the tests for the Hyperbolic and Variance-Gamma distributions,
which yielded even worse p-values than those obtained for the Student's t. We thus decided
not to include them among our results.

To support this claim, a reshuffling of the data has been performed (Figure 5, middle
panel). It can be readily observed that Gaussianity is reached in a few minutes, as
could be expected from the central limit theorem. Finally, we have also performed a
daily reshuffling of the returns to verify whether this scaling symmetry could be an
artifact of the lunch effect. As can be observed in Figure 5 (bottom panel), the scaling
symmetry of the data still holds, a fact that points towards long-range correlations as
the most plausible explanation for this symmetry. It goes without saying that the long
memory exhibited by the data and its autocorrelations deserve an in-depth study that shall
be addressed in future work.
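
The aggregation, rescaling and reshuffling steps can be sketched in Python as follows. This is a minimal illustration under assumptions: returns are aggregated over non-overlapping blocks, the function names are hypothetical, and the toy series is i.i.d. Gaussian noise rather than market data. Only the full reshuffling is shown; the daily reshuffling is not implemented here because its exact definition is not given in the text.

```python
import math
import random

def aggregate(returns, scale):
    """Sum consecutive non-overlapping blocks of `scale` returns
    (log-returns are additive under time aggregation)."""
    usable = len(returns) - len(returns) % scale
    return [sum(returns[i:i + scale]) for i in range(0, usable, scale)]

def rescale(values, scale, exponent=0.5):
    """Divide aggregated returns by scale**exponent (1/2 in this study)."""
    factor = scale ** exponent
    return [v / factor for v in values]

def reshuffle(returns, seed=0):
    """Random permutation of the series; it destroys temporal dependence
    while leaving the marginal distribution untouched."""
    shuffled = list(returns)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Toy i.i.d. series: after rescaling with scale**(1/2) the spread is unchanged
rng = random.Random(2009)
toy = [rng.gauss(0.0, 1.0) for _ in range(32000)]
ratio = std(rescale(aggregate(toy, 16), 16)) / std(toy)
```

Comparing the histograms of `rescale(aggregate(x, s), s)` for several scales `s`, computed on the original series and on `reshuffle(x)`, would be one way to reproduce the kind of comparison shown in the top and middle panels of Figure 5.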



3 The value of 1/2 for the scaling exponent was obtained by Detrended Fluctuation Analysis,
and it is the one expected for independent identically distributed random variables with
finite second moment.
Figure 5. Rescaled probability distributions of the Ibex35 index observed at different
time intervals (dotted line: $N(0, 1)$). Top: raw data.
Middle: reshuffled data. Bottom: daily reshuffled data.
4. Discussion

As far as we know, this is the first study of the high-frequency returns of the Ibex35,
so we will necessarily compare our results with those obtained for similar market
indexes.

Platen and Sidorowicz, in their study of several world stock indexes [39], found that
for the daily returns of the Madrid stock exchange the best fit was provided by a
Student's t with 4.51 degrees of freedom, while the NIG distribution fit was not as
sound. The same results were obtained for a broad group of world stock indexes as an
extension of a previous study [25]. This result is in stark contrast with our findings,
where the NIG distribution outperforms the other members of the GH family as well
as the Levy-stable laws. As a matter of fact, according to the likelihood-ratio test, the
five-parameter GH family could be reduced to the four-parameter NIG model without
much loss.

The Levy-stable distributions also seem to be very well suited to model the log-returns
of many stock market indexes: this is the case, e.g., of the daily returns of the Hong
Kong Hang Seng index. Further, for this index the Levy-stable law provides a much
better fit than the NIG distribution [12]. Similar results have been observed for the
Mexican IPC index: the hypothesis of stability could not be rejected at the 5% confidence
level while the hypothesis of NIG distributed daily log-returns was clearly rejected [2].
However, according to our findings, the Levy-stable model is not the best option for
modeling the high-frequency returns of the Ibex35 since the NIG distribution is a much
better candidate.

Considering the tail behaviour of the distribution of log-returns, it is documented
that the S&P500 index follows a power law with $\alpha \approx 3$ [22, 23], while the
characteristic exponent of the German DAX lies in the range between 3 and 4 [30]. This is
the most commonly accepted range for the characteristic exponent of the tails of the
distribution of log-returns. However, the consensus is not complete, and some authors
[26, 32] claim that a characteristic exponent in the range $\alpha \in [3, 5]$ could be
expected, and even that the decay could well be exponential rather than hyperbolic. We have
obtained a seeming power-law behavior for the right (resp. left) tail of the distribution
with exponent $\alpha \approx 4.60$ (resp. $\alpha \approx 4.28$), inside the accepted
$\alpha \in [3, 5]$ range. The only probability distribution analyzed in our study that could
have this asymptotic behavior is the Student's t. The overall fit, however, rules it out as
a plausible model.

The scaling invariance of financial time series was first proposed and exploited by
Mandelbrot in his investigation of the variations of cotton prices [33]. With regard
to stock market indexes, scaling has been observed in the high-frequency returns of
the S&P500 index [34] and in the OBX index of the Oslo stock exchange [44], among
others. In both cases the aggregated log-returns were rescaled with $t^{1/\alpha}$, $\alpha$ being
the estimated tail parameter for the Levy-stable fit (1.4 for the S&P500 and 1.64 for
the OBX, values that are similar to what we have estimated (1.53)). In our study,
however, rescaling using the estimated tail exponent for the Levy-stable fit destroys
the symmetry; a value of $1/\alpha = 0.5$, obtained by DFA and equal to the one expected
for random variables with finite second moment, was used instead.
5. Summary

We have performed a statistical analysis of the high-frequency log-returns of the
Ibex35 index of the Madrid stock exchange over a two-year period (2009-2010). Particular
attention has been paid to describing the best probability distribution for the data,
since this question is still controversial in the recent literature; the only fact commonly
accepted (although not yet fully incorporated into pricing models) is the departure from
normality. Our results show that among the members of the family of Generalized
Hyperbolic laws the Normal Inverse Gaussian distribution is the one that provides the
best fit for the data. Furthermore, the 5-parameter GH family could well be reduced
to the 4-parameter NIG family without significant loss. This distribution also clearly
outperforms the Levy-stable laws as a statistical model for this index.

The tails of the distribution of log-returns behave as power laws with exponents
$\alpha \approx 4.28$ (left tail) and $\alpha \approx 4.60$ (right tail), a fact that according to
the generalized central limit theorem would not be compatible with the stability of the
distribution under aggregation. However, the empirical distribution of log-returns has been
observed to be stable over several time scales, ranging from a few seconds up to a few hours.
We conjecture that time correlations among the data are probably responsible for this
observed stability, since reshuffling the data destroys these time correlations and
restores the expected convergence results predicted by the central limit theorem. A more
thorough analysis of these time correlations shall be conducted in future work, together
with the development of derivative pricing models that take into account more realistic
distributions for the underlying assets.